Compiler-Assisted Dynamic Predicated Execution of Complex Control-Flow Structures

Authors

  • Hyesoon Kim
  • José A. Joao
  • Onur Mutlu
  • Yale N. Patt
Abstract

Even after decades of research in branch prediction, branch predictors remain imperfect, which results in significant performance loss in aggressive processors that support large instruction windows and deep pipelines. This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor. The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them. To achieve this without incurring large hardware cost and complexity, the compiler identifies branches that are suitable for dynamic predication, called diverge branches. The compiler also selects a control-flow merge (or reconvergence) point corresponding to each diverge branch to aid dynamic predication. If a diverge branch is hard to predict at run time, the microarchitecture dynamically predicates the instructions between the diverge branch and the corresponding merge point by first executing one path after the branch, then executing the other path, and later merging the data flow produced by the two paths using special select-uop instructions. The control-flow merge point is selected based on the frequently executed paths in the program using profile information. Therefore, the control flow from a diverge branch does not have to merge (but it usually does), which allows the dynamic predication of a much larger set of branches than simple hammock (if-else) branches. Our evaluations show that a diverge-merge processor outperforms a baseline with an aggressive branch predictor by 10.8% on average over 15 SPEC CPU2000 benchmarks, through an average reduction of 31% in pipeline flushes due to branch mispredictions. Furthermore, the proposed mechanism outperforms a previously proposed dynamic predication mechanism that can predicate only simple hammock branches by 7.8%.
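
The execute-both-paths-then-merge step described in the abstract can be illustrated with a toy software model. This is only a rough sketch under stated assumptions, not the paper's hardware design: registers are modeled as a Python dict, each path is a list of (destination, function) operations, and the names `run_path`, `dynamically_predicate`, and `simple_branch` are hypothetical. The select step stands in for the select-uops: for every register written on either path, it keeps the value from the path that the resolved branch condition chooses.

```python
# Toy model of dynamic predication with select-uops (illustrative only;
# all names are hypothetical and not taken from the paper).

def run_path(regs, path):
    """Execute a list of (dest, fn) ops on a private copy of the registers."""
    out = dict(regs)
    for dest, fn in path:
        out[dest] = fn(out)
    return out


def dynamically_predicate(regs, cond, taken_path, not_taken_path):
    """Execute both paths, then merge with one select per written register."""
    predicate = cond(regs)                      # branch condition, once resolved
    taken = run_path(regs, taken_path)          # first one path after the branch
    not_taken = run_path(regs, not_taken_path)  # then the other path
    chosen = taken if predicate else not_taken
    merged = dict(regs)
    written = {d for d, _ in taken_path} | {d for d, _ in not_taken_path}
    for dest in sorted(written):
        # Select step: keep the value produced on the path the predicate picks.
        # A register written only on the other path keeps its old value.
        if dest in chosen:
            merged[dest] = chosen[dest]
    return merged


def simple_branch(regs, cond, taken_path, not_taken_path):
    """Reference semantics: an ordinary branch that executes only one path."""
    return run_path(regs, taken_path if cond(regs) else not_taken_path)


# Example diverge branch: if r1 > 3 then r2 = r1 + 1 else r2 = r1 - 1; r3 = 0
cond = lambda r: r["r1"] > 3
taken_path = [("r2", lambda r: r["r1"] + 1)]
not_taken_path = [("r2", lambda r: r["r1"] - 1), ("r3", lambda r: 0)]

regs = {"r1": 5, "r2": 0, "r3": 7}
print(dynamically_predicate(regs, cond, taken_path, not_taken_path))
```

In this model the merged state always matches what `simple_branch` would produce, so no misprediction flush is needed regardless of the branch outcome; the cost is that both paths are executed.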

Similar resources

Predicate-Based Transformations to Eliminate Control and Data-Irrelevant Cache Misses

The performance of modern processors is increasingly dependent on their ability to execute multiple instructions per cycle. Explicitly Parallel Instruction Computing (EPIC) architectures can achieve high performance by using the compiler to express program instruction level parallelism (ILP) directly to the hardware. The predicated execution feature is critical to the success of the EPIC archit...

Speculative pre-execution assisted by compiler (SPEAR)

Speculative pre-execution achieves efficient data prefetching by running additional prefetching threads on spare hardware contexts. Various implementations for speculative pre-execution have been proposed, including compiler-based static approaches and hardware-based dynamic approaches. A static approach defines the p-thread at compile time and executes it as a stand-alone running thread. There...

A Comparison of Full and Partial Predicated Execution

One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing t...

Support for Software Assisted Speculative Execution

Computer architects strive to improve machine performance by exploiting parallelism, but control flow and data dependences limit available parallelism. Speculative execution enhances parallelism by selectively ignoring the constraints of control flow and data dependences, thereby executing instructions before it is known whether they are needed or correct. Software assisted speculative executio...

A Comparison of Full and Partial Predicated Execution Support for ILP (ISCA-22, Jun 1995)

One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing t...

Publication date: 2006